Probabilistic Binary-Mask Cocktail-Party Source Separation in a Convolutional Deep Neural Network
نویسنده
چکیده
Separation of competing speech is a key challenge in signal processing and a feat routinely performed by the human auditory brain. A long standing benchmark of the spectrogram approach to source separation is known as the ideal binary mask. Here, we train a convolutional deep neural network, on a twospeaker cocktail party problem, to make probabilistic predictions about binary masks. Our results approach ideal binary mask performance, illustrating that relatively simple deep neural networks are capable of robust binary mask prediction. We also illustrate the trade-off between prediction statistics and separation quality.
منابع مشابه
Deep Transform: Cocktail Party Source Separation via Complex Convolution in a Deep Neural Network
Convolutional deep neural networks (DNN) are state of the art in many engineering problems but have not yet addressed the issue of how to deal with complex spectrograms. Here, we use circular statistics to provide a convenient probabilistic estimate of spectrogram phase in a complex convolutional DNN. In a typical cocktail party source separation scenario, we trained a convolutional DNN to re-s...
متن کاملDeep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of vo...
متن کاملDeep Transform: Cocktail Party Source Separation via Probabilistic Re-Synthesis
In cocktail party listening scenarios, the human brain is able to separate competing speech signals. However, the signal processing implemented by the brain to perform cocktail party listening is not well understood. Here, we trained two separate convolutive autoencoder deep neural networks (DNN) to separate monaural and binaural mixtures of two concurrent speech streams. We then used these DNN...
متن کاملMonaural Audio Speaker Separation Using Source-Contrastive Estimation
We propose an algorithm to separate simultaneously speaking persons from each other, the “cocktail party problem”, using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is descriptive of independent speakers. Such a vector space can embed empirically determined speaker characteristics and is optimized by distinguishing between speake...
متن کاملCombining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks
Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1503.06962 شماره
صفحات -
تاریخ انتشار 2015